

International Journal of Research and Reviews in Applied Sciences And Engineering (IJRRASE) Vol 8. No.1 –2016 Pp.204-210 ©gopalax Journals, Singapore available at: www.ijcns.com ISSN: 2231-0061

# FPGA BASED MODIFIED ARCHITECTURE OF FFT WITH REDUCED DELAY USING MACRO

S.Subathradevi<sup>#1</sup>, C.Vennila<sup>\*2</sup>, M.Lakshmiprabha<sup>#3</sup>, B.Logeshwari<sup>#4</sup>, S.Malathy<sup>#5</sup> <sup>#1</sup>Assistant Professor, Department of ECE, <sup>#2</sup>Professor, Department of ECE, <sup>#3</sup>Students, Department of ECE, <sup>#1, 2,3</sup>Anna University-BIT Campus, Tiruchirappalli, India. <sup>#2</sup>Saranathan College of Engineering, Panjapur, Tiruchirappalli, India.

**ABSTRACT.** The fast Fourier transform (FFT) is a critical block widely used in digital signal processing algorithms. With advances in semiconductor processing technology for VLSI systems, different approaches have been tried to optimize the algorithm for a wide variety of parameters such as area, power, and speed. Resource-efficient FFT processors have become a common requirement for high-speed devices and **OFDM** transceivers. Many applications in **OFDM-based** wireless broadband communication systems require an FFT/IFFT processor with low arithmetic complexity and high speed. For this reason, it is essential to develop an FFT/IFFT processor of optimum complexity that meets low-power and real-time requirements. To speed up the FFT computation and reduce chip size, the radix is increased. The design units are simulated in ISim (Xilinx ISE 8.1). The overall area and power are also reduced.

KEYWORDS— VLSI, FFT/IFFT, path delay, DSP.

## **I. INTRODUCTION**

To perform the frequency analysis of a discrete-time signal, we convert the time-domain sequence into an equivalent frequency-domain representation. Such a representation is given by the Fourier transform. The Discrete Fourier Transform (DFT) is the form of the Fourier transform used for Fourier analysis of discrete-time sequences; it transforms time-domain signals into frequency-domain signals. In an N-point DFT, evaluating each DFT sample requires N complex multiplications and N-1 complex additions. Since N such computations are required, there are $N^2$ complex multiplications and $N(N-1)$ complex additions in all. The FFT procedure for synthesizing and analyzing the Fourier series was given by Cooley and Tukey. It is a computationally efficient way to calculate the DFT and provides a divide-and-conquer approach to its computation. The wide use of DFTs in digital signal processing applications is the motivation for implementing FFTs.
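The operation counts above can be checked with a direct DFT. This is a minimal Python sketch (the name `dft_direct` is illustrative, not from the paper) that counts one complex multiplication per term and one complex addition per accumulated term, the same way the text counts them, including the trivial multiplications by $W_N^0 = 1$.

```python
import cmath

def dft_direct(x):
    """Direct N-point DFT X[k] = sum_n x[n] W_N^{nk}, counting complex operations."""
    n_points = len(x)
    mults = adds = 0
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            w = cmath.exp(-2j * cmath.pi * n * k / n_points)  # W_N^{nk}
            term = x[n] * w      # one complex multiplication per term
            mults += 1
            if n > 0:
                adds += 1        # N terms need N-1 complex additions
            acc += term
        out.append(acc)
    return out, mults, adds
```

For N = 8 this reports 64 multiplications and 56 additions, matching $N^2$ and $N(N-1)$.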

There are two methods of FFT: Decimation in Time (DIT) and Decimation in Frequency (DIF). In DIT, the inputs are fed in bit-reversed order and the outputs are obtained in normal order. In DIF, the inputs are given to the butterfly unit in normal order and the outputs are obtained in bit-reversed order.
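Bit-reversed ordering, used for the DIT input and the DIF output, can be sketched as follows. The function names are illustrative; index i is mapped to the index whose log2(N)-bit binary representation is i's bits reversed.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift the next low bit of index into result
        index >>= 1
    return result

def bit_reversed_order(seq):
    """Permute a power-of-2-length sequence into bit-reversed order."""
    n = len(seq)
    if n == 0 or (1 << (n.bit_length() - 1)) != n:
        raise ValueError("length must be a power of 2")
    bits = n.bit_length() - 1
    return [seq[bit_reverse(i, bits)] for i in range(n)]

print(bit_reversed_order(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```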

## II. FFT/IFFT

The FFT architecture used to perform the frequency analysis of a discrete-time signal consists of five basic blocks: a dual-port RAM that holds the input, output, and intermediate values; a butterfly processing unit built from radix-2 butterflies, which is the heart of the FFT algorithm; ROMs that store the twiddle factors; an address generation unit that extracts data from the RAM and ROM; and finally a controller.
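The contents of the twiddle-factor ROMs can be generated offline. This is a minimal sketch, assuming 16-bit Q15 fixed-point coefficients (chosen to match the 16x16-bit multipliers described for the butterfly unit; the paper does not state the number format) and separate cosine and sine tables for $W_N^k = \cos(2\pi k/N) - j\sin(2\pi k/N)$, $k = 0, \ldots, N/2-1$. The helper name `twiddle_roms` is hypothetical.

```python
import math

def twiddle_roms(n_points, frac_bits=15):
    """Return (cos_rom, sin_rom) for W_N^k, k = 0 .. N/2-1, quantized to signed
    fixed point with `frac_bits` fractional bits (Q15 by default, an assumption)."""
    scale = 1 << frac_bits
    max_code = scale - 1  # +1.0 is not representable in Q15, so saturate to 32767
    cos_rom, sin_rom = [], []
    for k in range(n_points // 2):
        angle = 2 * math.pi * k / n_points
        cos_rom.append(min(max_code, round(math.cos(angle) * scale)))
        sin_rom.append(min(max_code, round(math.sin(angle) * scale)))
    return cos_rom, sin_rom
```

For an 8-point FFT this yields `cos_rom = [32767, 23170, 0, -23170]` and `sin_rom = [0, 23170, 32767, 23170]`.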



Fig.1 FFT Block diagram

## A. RADIX-2 FFT

Radix-2 is the first FFT algorithm and was proposed by Cooley and Tukey in 1965. The DFT of a given sequence x[n] can be computed using the formula

$$X[k]=\sum_{n=0}^{N-1} x[n]\,W_N^{nk},\qquad k=0,1,\ldots,N-1,$$

where $W_N = e^{-j2\pi/N}$ is the twiddle factor.



Fig.2 Radix-2 Butterfly diagram

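Split x[n] into its even- and odd-indexed samples, whose N/2-point DFTs are $X_1(k)$ and $X_2(k)$. Using the periodicity ($W_N^{k+N} = W_N^k$) and symmetry ($W_N^{k+N/2} = -W_N^k$) properties of the twiddle factor then gives

$$X(k)=X_1(k)+W_N^k X_2(k)$$

$$X(k+N/2)=X_1(k)-W_N^k X_2(k),\qquad k=0,1,\ldots,N/2-1.$$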
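Applying these two equations recursively gives the radix-2 decimation-in-time FFT. The following Python sketch is a software illustration of that recursion, not the authors' hardware design; it assumes N is a power of 2, and `fft_radix2` is an illustrative name.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of 2")
    x1 = fft_radix2(x[0::2])  # X1(k): N/2-point DFT of the even-indexed samples
    x2 = fft_radix2(x[1::2])  # X2(k): N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * x2[k]  # W_N^k * X2(k)
        out[k] = x1[k] + t           # X(k)       = X1(k) + W_N^k X2(k)
        out[k + n // 2] = x1[k] - t  # X(k + N/2) = X1(k) - W_N^k X2(k)
    return out
```

Each level of the recursion halves the transform size, so an N-point transform is reduced to log2(N) stages of 2-point butterflies.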

If N is a power of 2, the same computational procedure can be applied recursively until the N-point DFT is evaluated as a collection of 2-point DFTs. In the general description of the FFT, the whole operation is partitioned into three processes: data load, computation, and result unload. Initially, the input data are supplied to the processor as simulation inputs and written into the RAM. The processing cycle starts with the data load process: the stored data are read from the RAM and the FFT start signal goes high. The FFT of the stored data is then computed. Finally, the results are stored in the RAM and become available at the output.

RAM: The RAM holds different types of data as the FFT computation proceeds. Initially, it stores the input data for the FFT computation. After each butterfly operation, the results overwrite the input data positions. Finally, during the output process, bit-reversed addresses are given to the RAM, which outputs the data accordingly. The dual-port capability of the RAM makes two data samples available at the same time, which decreases the overall computation time of the FFT.
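The in-place use of the RAM and the bit-reversed unload can be modeled in software. In this sketch a Python list stands in for the RAM, and a radix-2 decimation-in-frequency schedule is assumed (inputs in normal order, outputs in bit-reversed order, as described for DIF earlier). Each butterfly reads two locations, as a dual-port RAM allows, and writes its results back to the same two locations. The function names are hypothetical.

```python
import cmath

def fft_dif_in_place(ram):
    """In-place radix-2 DIF FFT on a list modeling the data RAM.
    Inputs are in normal order; on return the results are in bit-reversed order."""
    n = len(ram)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):          # each group of butterflies
            for j in range(span):
                a, b = start + j, start + j + span   # the two RAM addresses read together
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                top, bot = ram[a], ram[b]
                ram[a] = top + bot                   # results overwrite the
                ram[b] = (top - bot) * w             # input locations
        span //= 2
    return ram

def unload(ram):
    """Read the RAM with bit-reversed addresses to get X(0..N-1) in natural order."""
    n = len(ram)
    bits = n.bit_length() - 1
    return [ram[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(n)]
```

Because every butterfly writes back to the addresses it read, no second buffer is needed; only the final read-out uses reversed addresses.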



Fig.2 Radix-8 Butterfly diagram

# **III. LITERATURE SURVEY**

The architecture of the FFT plays a vital role in DSP processors, so architectural innovations that increase speed and reduce area and power consumption are important. In [1], a modular approach is used to develop a parallel pipeline architecture whose operating frequency can be lowered, which in turn reduces power consumption. [2] presents a novel power-saving technique supported by two design models, a multi-transform model and a multimedia functional unit, for multimedia purposes. In [3], the speed performance of the design easily satisfies the requirements of most OFDM-based wireless communication systems.

Butterfly Unit: The butterfly is the basic operator of the FFT. It takes two data words from memory and computes a two-point FFT. The input data are in 2's complement format. A butterfly unit consists of N/2 butterflies, and each unit contains two ROMs that store the cosine and sine coefficients of the twiddle factors, four 16×16-bit multipliers, six 32-bit accumulators, and two concatenation operators that produce the correct data format at the output. The whole butterfly operation takes six time steps. First, the two data inputs and the twiddle factor coefficients are read (R). They are multiplied (×) to obtain the partial products in 16-bit format. The products are then added and subtracted, and the results are truncated (+/-).
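A bit-accurate sketch of one butterfly follows, covering the read, multiply, add/subtract, and truncate steps, plus the concatenation (&) into a 32-bit word described after Fig.3. It assumes Q15 data and twiddles (16-bit two's complement with 15 fraction bits), truncation by an arithmetic right shift, wrap-around rather than saturation on overflow, and the real part in the upper 16 bits of the packed word; none of these details are specified in the paper, and the helper names are hypothetical. It follows the DIT form $A' = A + WB$, $B' = A - WB$.

```python
def to_s16(v):
    """Wrap an integer to a signed 16-bit two's-complement value (no saturation)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def pack32(re, im):
    """Concatenate (&) 16-bit real and imaginary parts into one 32-bit word."""
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)

def butterfly_q15(a, b, w):
    """Radix-2 butterfly A' = A + W*B, B' = A - W*B on Q15 integers.
    a, b, w are (real, imag) tuples of signed 16-bit values."""
    ar, ai = a
    br, bi = b
    wr, wi = w
    # Four 16x16 multiplications give 32-bit partial products.
    p1, p2, p3, p4 = br * wr, bi * wi, br * wi, bi * wr
    # Combine, then truncate to 16 bits by dropping the 15 fraction bits
    # (>> floors, like an arithmetic right shift in hardware).
    tr = (p1 - p2) >> 15  # Re(W*B)
    ti = (p3 + p4) >> 15  # Im(W*B)
    top = (to_s16(ar + tr), to_s16(ai + ti))
    bottom = (to_s16(ar - tr), to_s16(ai - ti))
    return pack32(*top), pack32(*bottom)
```

With `w = (32767, 0)` (approximately 1), `butterfly_q15((1000, 2000), (3000, -4000), w)` gives the packed words for (3999, -2000) and (-1999, 6000); the 1-LSB error in the real part comes from truncating toward minus infinity.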



Fig.3 Butterfly Unit

The truncated real and imaginary parts are concatenated (&) to obtain the 32-bit format. The adder used in the butterfly computation is the carry select adder (CSLA), one of the fastest adder architectures. Because the arithmetic operations are pipelined and the additions use the CSLA, the overall computation time is reduced.
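The carry-select principle can be illustrated behaviorally. In this sketch each block computes its sum twice, once assuming a carry-in of 0 and once assuming 1, and the actual carry from the previous block only drives a selection (a multiplexer in hardware). The 4-bit block size, the 32-bit width, and the function names are assumptions. In hardware all block sums are computed in parallel, whereas the Python loop evaluates them one after another.

```python
def block_add(a, b, cin, width):
    """Behavioral `width`-bit adder block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, width=32, block=4):
    """Carry-select addition of two unsigned `width`-bit values.
    Returns (sum mod 2**width, carry_out)."""
    if width % block:
        raise ValueError("width must be a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        sum0, c0 = block_add(a_blk, b_blk, 0, block)    # precomputed for carry-in = 0
        sum1, c1 = block_add(a_blk, b_blk, 1, block)    # precomputed for carry-in = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # selection by the incoming carry
        result |= s << lo
    return result, carry
```

The delay is then roughly one block addition plus a chain of multiplexers, instead of a full ripple through all 32 bits.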



Fig.4.

Address Generator: The address generation unit provides the ROM and RAM with the correct addresses. It also keeps track of which butterfly is being computed in which stage. For an 8-point complex FFT, there are 3 stages, each consisting of 4 butterflies.
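One possible address schedule for the 8-point case (3 stages of 4 butterflies) is sketched below. It assumes an in-place radix-2 DIF ordering, consistent with loading inputs in normal order and unloading with bit-reversed addresses; `dif_addresses` is a hypothetical name, and the twiddle ROM address is taken to be the exponent k of $W_N^k$.

```python
def dif_addresses(n_points):
    """Yield (stage, butterfly, addr_a, addr_b, twiddle_addr) for an
    in-place radix-2 DIF FFT of n_points (a power of 2)."""
    stages = n_points.bit_length() - 1
    for stage in range(stages):
        span = n_points >> (stage + 1)   # distance between the two butterfly inputs
        for bf in range(n_points // 2):
            group, j = divmod(bf, span)
            addr_a = group * 2 * span + j  # RAM address of the upper input
            addr_b = addr_a + span         # RAM address of the lower input
            twiddle_addr = j << stage      # index k of W_N^k in the twiddle ROM
            yield stage, bf, addr_a, addr_b, twiddle_addr

for row in dif_addresses(8):
    print(row)
```

In every stage each of the 8 RAM addresses is touched exactly once, which is what allows the results to overwrite the inputs.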

Since the addresses used during the input, output, and FFT computation processes differ, the unit keeps track of the chip's mode of operation and generates the required address. The mode of operation is supplied by the controller.

Controller: The controller controls the whole FFT computation. It acts as combinational state logic.
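The controller's role can be sketched as a small state machine whose next-state function is purely combinational. The mode names, the `start`/`done` signals, and the timing (one word loaded or unloaded per clock, one butterfly per clock) are assumptions for illustration, not the paper's design.

```python
from enum import Enum

class Mode(Enum):
    IDLE = 0
    LOAD = 1     # input data written into RAM in normal order
    COMPUTE = 2  # butterflies read and write RAM in place
    UNLOAD = 3   # RAM read with bit-reversed addresses

def next_mode(mode, start, done):
    """Combinational next-state logic: depends only on the current mode and inputs."""
    if mode is Mode.IDLE:
        return Mode.LOAD if start else Mode.IDLE
    if not done:
        return mode  # stay until the phase counter finishes
    return {Mode.LOAD: Mode.COMPUTE,
            Mode.COMPUTE: Mode.UNLOAD,
            Mode.UNLOAD: Mode.IDLE}[mode]

def simulate_controller(n_points):
    """Clock the FSM once from start to finish and count cycles spent in each mode."""
    stages = n_points.bit_length() - 1
    phase_cycles = {Mode.LOAD: n_points,
                    Mode.COMPUTE: (n_points // 2) * stages,
                    Mode.UNLOAD: n_points}
    cycles = {m: 0 for m in phase_cycles}
    mode = next_mode(Mode.IDLE, start=True, done=False)
    while mode is not Mode.IDLE:
        cycles[mode] += 1                            # one clock spent in this mode
        done = cycles[mode] == phase_cycles[mode]    # phase counter reached its end
        mode = next_mode(mode, start=False, done=done)
    return cycles
```

Under these assumptions an 8-point run spends 8 cycles loading, 12 computing (3 stages of 4 butterflies), and 8 unloading; the current mode is what the address generator would use to choose its address sequence.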

## IV. PROPOSED ARCHITECTURE

The architecture of the FFT plays a vital role in DSP processors. Architectural innovations that reduce delay and power consumption while increasing speed are therefore important.

A memory-based recursive FFT design with a lower gate count, lower power consumption, and higher speed was proposed in [10]. To reduce power consumption and chip area, special current-mode SRAMs were adopted to replace the shift registers in the delay line [8]. A modular approach was used to develop a pipeline architecture whose operating frequency can be reduced, which in turn reduces power consumption [5]. An efficient multiplication technique was used to reduce the partial products needed to compute the DFT [3]. A power-saving technique based on two design models (a multi-transform model and a multimedia functional unit) was used for multimedia purposes [9].

# **IV. RESULTS**

# A. Synthesis Report (8-point coding)

### DEVICE UTILIZATION SUMMARY:

| Selected Device        | 3s500efg320-4       |
|------------------------|---------------------|
| Number of Slices       | 11 out of 4656 (0%) |
| Number of 4 input LUTs | 20 out of 9312 (0%) |
| Number of IOs          | 23                  |
| Number of bonded IOBs  | 23 out of 232 (9%)  |

### TIMING SUMMARY:

| Speed Grade                              | -4            |
|------------------------------------------|---------------|
| Minimum period                           | No path found |
| Minimum input arrival time before clock  | No path found |
| Maximum output required time after clock | No path found |
| Maximum combinational path delay         | 9.141 ns      |

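# **B. Synthesis Report (16-point coding)**

### DEVICE UTILIZATION SUMMARY:

| Selected Device        | 3s500efg320-4        |
|------------------------|----------------------|
| Number of Slices       | 38 out of 4656 (0%)  |
| Number of 4 input LUTs | 68 out of 9312 (0%)  |
| Number of IOs          | 287                  |
| Number of bonded IOBs  | 175 out of 232 (75%) |

### TIMING SUMMARY:

| Speed Grade                              | -4            |
|------------------------------------------|---------------|
| Minimum period                           | No path found |
| Minimum input arrival time before clock  | No path found |
| Maximum output required time after clock | No path found |
| Maximum combinational path delay         | 10.467 ns     |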

## C. Simulation output (16-point coding)



**D. Simulation output (8-point coding)** 



**D. Simulation output (8-point IP core)** 



# **E. Synthesis Report (8-point IP core)**

#### SYNTHESIS REPORT FOR IP CORE (8-POINT FFT):

#### DEVICE UTILIZATION SUMMARY

| Selected Device            | 3s500efg320-4        |
|----------------------------|----------------------|
| Number of Slices           | 393 out of 4656 (8%) |
| Number of Slice Flip Flops | 481 out of 9312 (5%) |
| Number of 4 input LUTs     | 426 out of 9312 (4%) |
| Number of IOs              | 55                   |
| Number of bonded IOBs      | 55 out of 232 (23%)  |
| Number of BRAMs            | 3 out of 20 (15%)    |
| Number of MULT18X18SIOs    | 3 out of 20 (15%)    |
| Number of GCLKs            | 1 out of 24 (4%)     |

#### TIMING SUMMARY

| Speed Grade                              | -4                                        |
|------------------------------------------|-------------------------------------------|
| Minimum period                           | 5.306 ns (Maximum Frequency: 188.466 MHz) |
| Minimum input arrival time before clock  | 4.794 ns                                  |
| Maximum output required time after clock | 4.880 ns                                  |
| Maximum combinational path delay         | 5.306 ns                                  |

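# **F. Synthesis Report (16-point IP core)**

#### DEVICE UTILIZATION SUMMARY:

| Selected Device            | 3s500efg320-4         |
|----------------------------|-----------------------|
| Number of Slices           | 668 out of 4656 (14%) |
| Number of Slice Flip Flops | 831 out of 9312 (8%)  |
| Number of 4 input LUTs     | 732 out of 9312 (7%)  |
| Number of IOs              | 90                    |
| Number of bonded IOBs      | 90 out of 232 (38%)   |
| Number of BRAMs            | 3 out of 20 (15%)     |
| Number of MULT18X18SIOs    | 3 out of 20 (15%)     |
| Number of GCLKs            | 1 out of 24 (4%)      |

#### TIMING SUMMARY:

| Speed Grade                              | -4                                        |
|------------------------------------------|-------------------------------------------|
| Minimum period                           | 5.366 ns (Maximum Frequency: 186.359 MHz) |
| Minimum input arrival time before clock  | 3.986 ns                                  |
| Maximum output required time after clock | 4.965 ns                                  |
| Maximum combinational path delay         | 5.366 ns                                  |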

### **G. Simulation output (16-point IP core)**



## **IV. RESULTS COMPARISON**

## **H. Simulation output**

| FFT SIZE         | 8-POINT | 8-POINT | 16-POINT  | 16-POINT |
|------------------|---------|---------|-----------|----------|
| TYPE             | VERILOG | IP-CORE | VERILOG   | IP-CORE  |
| NUMBER OF SLICES | 11      | 393     | 38        | 668      |
| NUMBER OF LUTS   | 20      | 426     | 68        | 732      |
| PATH DELAY       | 9.141 ns | 5.306 ns | 10.467 ns | 5.366 ns |
| AREA             | 51      | 1245    | 174       | 2132     |
| POWER            | 0.081   | 0.081   | 0.081     | 0.081    |
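As a quick arithmetic check on the comparison table, the path-delay reduction of each IP core relative to the corresponding Verilog coding, and the frequency equivalent of the IP-core path delay, can be computed from the reported values. This sketch only re-derives numbers already reported. For the 16-point IP core, 1000/5.366 ns reproduces the 186.359 MHz maximum frequency in its synthesis report, because its minimum period equals its path delay. No such frequency is computed for the Verilog designs, whose reports show "No path found" for the clock period.

```python
# Path delays from the comparison table, in ns.
path_delay_ns = {
    ("8-point", "Verilog"): 9.141,
    ("8-point", "IP core"): 5.306,
    ("16-point", "Verilog"): 10.467,
    ("16-point", "IP core"): 5.366,
}

def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

for size in ("8-point", "16-point"):
    verilog = path_delay_ns[(size, "Verilog")]
    ip_core = path_delay_ns[(size, "IP core")]
    print(f"{size}: {reduction_percent(verilog, ip_core):.1f}% lower path delay, "
          f"1/delay = {1000.0 / ip_core:.3f} MHz for the IP core")
```

This gives a reduction of about 42.0% for the 8-point design and about 48.7% for the 16-point design.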

### RESULT COMPARISON CHART:

### PICTOGRAPH



# **IV. CONCLUSION AND DISCUSSION**

In this work, we presented a novel FFT architecture, "FPGA based modified Architecture of FFT with Reduced delay using Macro", implemented on a Spartan-3E target device. The architecture provides an optimized combinational path delay, and further analysis in this direction can lead to reduced area and power for the FFT architecture.

## **V. FUTURE WORK**

Although this work focuses on minimizing the path delay of the architecture, the same can be achieved by implementing the design with other reusable sub-modules, which may also be controlled by a routing algorithm. If the delay in the architecture is minimized, the power dissipation of the proposed architecture can also be reduced.

# **VI.REFERENCES**

- [1] R. Ramachandran, J. Thomas Joseph Prakash, "FPGA based SoC for Railway Level Crossing Management System", International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 2, Issue 3, July 2012.
- [2] Tabia Hossain, Syed Syed Shihab Uddin, Iqbalur Rahman Rokon, K. M. A. Salam, M. Abdul Awal, "Proficient FPGA Execution of Secured and Apparent Electronic Voting Machine using Verilog HDL", International Journal of Environment 4(1): 18-24 (2014).
- [3] Harenndra Kumar Sharma, Arvind Kumar Singh (MIEEE), "OFDM system using FFT and IFFT", Volume 01, Issue 2, January 2015, ISSN 2394-3084.
- [4] M. Sheik Mohamed, K. Jaalal Deen, R. Ganesan, "VLSI based FFT processor with Improvement in Computation Speed and Area Reduction", Department of Electronics and Communication Engineering, Sethu Institute of Technology, Anna University, Tamilnadu, Chennai.
- [5] Naveen Motamari, "High Gain Narrow Band LNA Design for Wi-Max Applications at 3.5 GHz", Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, Odisha, India-769 008, May 2014.
- [6] N. Kirubanandasarathy, G. P. Ramesh, "Design and Analysis of an Area Efficient and Low Power New-R2MDC FFT for MIMO OFDM in Wireless Communication", International Journal of Computer Science and Engineering Communications (IJCSEC), Vol. 2, Issue 4, July 2014.
- [7] Sergio Saponara, Massimo Rovini, Luca Fanucci, Athanasios Karachalios, George Lentaris, Dionysios Resis, "Design and Comparison of FFT VLSI Architectures for SoC Telecom Applications with Different Flexibility, Speed and Complexity Tradeoffs", Circuits, Systems and Signal Processing (2012).
- [8] D. Rajaveerappa, K. Umapathy, "Low Power and High Speed 128-point FFT/IFFT Processor for OFDM Applications", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, March 2012.
- [9] R. Prashant, B. V. S. L. Bharathi, "A Low Power VLSI Technique for Digital Signal Processing Portable Electronic Devices", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 2, Issue 2 (Mar.-Apr. 2013), pp. 20-24, e-ISSN: 2319-4200, p-ISSN: 2319-4197.
- [10] K. Harikrishna, T. Rama Rao, Vladimir A. Baby, "Algorithm for OFDM based IEEE 802.16d (Fixed WiMAX) Communications", Journal of Electronic Science and Technology, Vol. 8, No. 3, September 2010.
- [11] V. R. Gad, R. S. Gad, G. M. Naik, "Implementation of Gigabit Ethernet Standard using FPGA", International Journal of Mobile Network Communications & Telematics (IJMNCT), Vol. 2, No. 4, August 2012.
- [12] Unnati C. Mehta, Satyedra Sharma, "VLSI Implementation of 2048 point FFT/IFFT for Mobile Wi-MAX", Department of Electronics and Communication Engineering, NOIDA Institute of Engineering and Technology.
- [13] Rafidah Ahmad, Othman Sidek, Shukri Korakkottil Kunhi Mohd, "Implementation of a Verilog-Based Digital Receiver for 2.4 GHz Zigbee Applications on FPGA", Journal of Engineering Science and Technology, Vol. 9, No. 1 (2014), 136-153.
- [14] K. Maharatna, E. Grass, U. Jagdold, "A Low-Power 64-point FFT/IFFT Architecture for Wireless Broadband Communication", Department of Systems Design, IHP GmbH, Technology Park 25, D-15236 Frankfurt (Oder), Germany.
- [15] Murtuza Jeeranwala, Shruti Oza, "Implementation of Efficient 64-Point FFT/IFFT Block for OFDM Transreceiver of IEEE 802.11a", Volume 3, Issue 5, May 2014.
- [16] R. Seshadri, S. Ramakrishnan, G. Hemalatha, Vijayalakshmi, "Spurious Power Suppression Technique for VLSI Architecture", Indian Journal of Computer Science and Engineering (IJCSE).